Added erf(x) Float64 and Float32 Julia implementations by AhmedYKadah · Pull Request #491 · JuliaMath/SpecialFunctions.jl

AhmedYKadah · 2025-03-31T15:47:05Z

Faster than the current wrapper function call (including the Float32 call).
Uses an algorithm based on https://github.com/ARM-software/optimized-routines/blob/master/math/erf.c

codecov · 2025-03-31T16:29:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.16%. Comparing base (46a2874) to head (f7470ba).
⚠️ Report is 27 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #491      +/-   ##
==========================================
+ Coverage   94.02%   94.16%   +0.13%     
==========================================
  Files          14       14              
  Lines        2897     2965      +68     
==========================================
+ Hits         2724     2792      +68     
  Misses        173      173

Flag	Coverage Δ
unittests	`94.16% <100.00%> (+0.13%)`	⬆️


AhmedYKadah · 2025-03-31T17:28:21Z

Old:
Float64
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 217729 samples with 1000 evaluations per sample.
Range (min … max): 6.300 ns … 283.700 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.500 ns ┊ GC (median): 0.00%
Time (mean ± σ): 21.993 ns ± 13.209 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 312732 samples with 1000 evaluations per sample.
Range (min … max): 4.300 ns … 125.100 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.900 ns ┊ GC (median): 0.00%
Time (mean ± σ): 15.035 ns ± 7.951 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

New:
Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 507504 samples with 1000 evaluations per sample.
Range (min … max): 5.400 ns … 4.890 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.700 ns ┊ GC (median): 0.00%
Time (mean ± σ): 8.775 ns ± 9.855 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32
@benchmark Float32(erf(data)) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 526797 samples with 1000 evaluations per sample.
Range (min … max): 5.400 ns … 195.500 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.800 ns ┊ GC (median): 0.00%
Time (mean ± σ): 8.521 ns ± 2.236 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

AhmedYKadah · 2025-03-31T17:30:33Z

A Float32 implementation is available, but it is not faster than the Float64 version because of an exp() call.
The Float64 version is still faster than the old Float32 one.

AhmedYKadah · 2025-04-05T13:09:02Z

Need to clean up the polynomial evaluations.
The code could also use more organization.
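
For readers unfamiliar with what a tidy polynomial evaluation looks like in Julia, here is a minimal sketch using `Base.evalpoly` (Horner's scheme). It is not the code from this PR: the helper name `erf_small_sketch` and the constant `TAYLOR_COEFFS` are hypothetical, and the coefficients come from the textbook Maclaurin series of erf rather than from the ARM routine, so it is only meant for small |x|.

```julia
# Hypothetical illustration, not code from this PR.
# Maclaurin series: erf(x) = 2/sqrt(pi) * sum_{n>=0} (-1)^n x^(2n+1) / (n! (2n+1)).
# Entry k of the tuple holds the coefficient for n = k - 1.
const TAYLOR_COEFFS = ntuple(k -> (-1.0)^(k - 1) / (factorial(k - 1) * (2k - 1)), 12)

function erf_small_sketch(x::Float64)
    # The series is odd in x, so evaluate a polynomial in x^2 with Horner's
    # scheme (Base.evalpoly) and multiply by x once at the end.
    p = evalpoly(x * x, TAYLOR_COEFFS)
    return 2 / sqrt(pi) * x * p
end
```

Keeping the coefficients in a constant tuple and routing every evaluation through `evalpoly` keeps the arithmetic in one place, which is one way to address the organization concern. A production kernel would normally swap in precomputed minimax coefficients per interval.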

AhmedYKadah · 2025-08-02T08:07:59Z

Remaining: erfc Float64 and Float32 implementations, and the erf Float32 implementation

src/erf.jl

…necessary whitespace, and removed explicit copysigns

oscardssmith · 2025-09-14T00:59:31Z

src/erf.jl

-
 end

 _erf(x::Float16)=Float16(_erf(Float32(x)))


if you wanted to do a Float16 impl, it should be easier than the others. Specifically, the domain is only to 2, and the accuracy required is much reduced.

100% could wait for a followup PR.

I'm thinking that too, to be honest.
This and the polynomial regeneration.

oscardssmith · 2025-09-15T03:47:22Z

Given that this is faster and accurate, seems good to merge to me!

mschauer · 2025-09-16T06:30:08Z

Are there any tests for edge cases/ULP in the C version that we do not do ourselves?
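
As a rough illustration of what a ULP check could look like, the sketch below measures the error of a function in units in the last place against a BigFloat evaluation of the same function. The helpers `ulp_error` and `max_ulp_error` are hypothetical names, not part of SpecialFunctions.jl or of the C test suite, and `exp` is used only so the example runs without extra packages.

```julia
# Hypothetical helpers, not part of this PR or of the ARM test suite.
# Error of f(x) in ULPs, measured against a BigFloat evaluation of f.
function ulp_error(f, x::T) where {T<:AbstractFloat}
    reference = f(big(x))        # high-precision reference value (BigFloat)
    approx = f(x)                # value under test, computed in precision T
    ulp = eps(T(reference))      # spacing of T near the rounded reference
    return Float64(abs(big(approx) - reference) / ulp)
end

# Largest observed error over a collection of sample inputs.
max_ulp_error(f, xs) = maximum(x -> ulp_error(f, x), xs)

# Example with a Base function standing in for erf:
max_ulp_error(exp, range(-1.0, 1.0; length=101))
```

For this PR one would presumably pass `erf` after `using SpecialFunctions`, assuming its BigFloat method is available as the reference; sampling should cover the interval boundaries of the new implementation as well as random points.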

devmotion

The implementation does not handle NaN32 and NaN16 correctly:

julia> erf(NaN32)
1.0f0

julia> erf(NaN16)
Float16(1.0)
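
One way a result like this can arise, sketched under assumptions: every comparison with NaN is false, so a range test that routes "not small" inputs to the saturation branch also routes NaN there. The two functions below are hypothetical stand-ins (with `tanh` replacing the real polynomial core and an arbitrary cutoff of 4), not the code in src/erf.jl.

```julia
# Hypothetical stand-ins, not the PR's implementation.
# `tanh` replaces the real polynomial core so the example is self-contained.
function saturating_unguarded(x::Float32)
    # Every comparison with NaN is false, so NaN fails this test
    # and falls through to the saturation branch below.
    abs(x) < 4.0f0 && return tanh(x)
    return copysign(1.0f0, x)
end

function saturating_guarded(x::Float32)
    isnan(x) && return x             # propagate NaN before any range test
    abs(x) < 4.0f0 && return tanh(x)
    return copysign(1.0f0, x)
end

saturating_unguarded(NaN32)  # returns 1.0f0, mirroring the report above
saturating_guarded(NaN32)    # returns NaN32
```

The guarded variant still saturates to ±1 for `Inf32` and `-Inf32`, which are the other edge cases worth covering in tests.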

src/erf.jl

Co-authored-by: David Müller-Widmann <devmotion@users.noreply.github.com>

AhmedYKadah · 2025-09-19T09:44:51Z

There aren't any tests for erfc. Is that expected?

AhmedYKadah · 2025-09-19T10:56:44Z

Any other changes needed?

oscardssmith · 2025-09-19T12:57:28Z

We should probably test erfc.

test/erf.jl

oscardssmith · 2025-11-14T13:21:19Z

Other than missing tests for Inf, looks good to me. @devmotion any further suggestions?

src/erf.jl

devmotion · 2025-11-14T20:48:06Z

What do the benchmarks show with the latest iteration of this PR? In #491 (comment) performance with Float32 seemed to regress.

AhmedYKadah · 2025-11-14T21:56:29Z

Yes, you're correct. The old implementation was not efficient so I redid it.

Old

Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 217081 samples with 1000 evaluations per sample.
Range (min … max): 6.400 ns … 973.200 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 30.100 ns ┊ GC (median): 0.00%
Time (mean ± σ): 22.028 ns ± 13.612 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32
@benchmark erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 308269 samples with 1000 evaluations per sample.
Range (min … max): 4.300 ns … 160.400 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.900 ns ┊ GC (median): 0.00%
Time (mean ± σ): 15.207 ns ± 8.609 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

New

Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 604748 samples with 1000 evaluations per sample.
Range (min … max): 4.700 ns … 164.400 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 7.400 ns ┊ GC (median): 0.00%
Time (mean ± σ): 7.302 ns ± 3.017 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32
@benchmark erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 641285 samples with 1000 evaluations per sample.
Range (min … max): 4.200 ns … 204.800 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.900 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.840 ns ± 2.961 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

devmotion

If you update the version number, we could tag a new release immediately after merging the PR.

AhmedYKadah · 2025-11-15T09:20:10Z

Anything left for me to do?

oscardssmith · 2025-11-15T15:13:16Z

yeah, can you bump the patch number (in Project.toml) so that when we merge the PR, we can tag a new version?

Project.toml

Co-authored-by: David Müller-Widmann <devmotion@users.noreply.github.com>

oscardssmith · 2025-11-15T21:15:20Z

Thanks @AhmedYKadah for the PR and @devmotion for the additional review!

AhmedYKadah · 2025-11-15T21:18:33Z

It's been a pleasure 😄

AhmedYKadah added 5 commits March 31, 2025 17:16

Added erf(x) Float64/Float32 Julia implementation

6f554ef

changed erf to _erf, got rid of unnecessary branch

7f4fd2d

fixed syntax error in ccall

e784c9f

fixed syntax error in ccall 2

da16cb1

NaN edge case for erf(x)

0a755b6

added test cases for erf(x)

6efcec8

AhmedYKadah and others added 2 commits August 2, 2025 10:19

Merge branch 'master' into erf(x)-implementation

0fc6d4d

cleaned up erf(Float64)

3cee8ce

AhmedYKadah added 4 commits September 14, 2025 02:40

added erf(x::Float32) implementation

b819c58

added NaN edge case to erf(x::Float32)

57bbaf2

Merge branch 'master' into erf(x)-implementation

5ad8278

reversed NaN check

26b3b1f

AhmedYKadah changed the title ~~Added erf(x) Float64 Julia implementation~~ Added erf(x) Float64 and Float32 Julia implementations Sep 14, 2025